Abstract

We consider interpolation of univariate functions on arbitrary sets of nodes by 
Gaussian radial basis functions or by exponential functions. We derive closed-form 
expressions for the interpolation error based on the Harish-Chandra-Itzykson-Zuber 
formula. We then prove the exponential convergence of interpolation for functions
analytic in a sufficiently large domain. As an application, we prove the global exponential
convergence of optimization by expected improvement for such functions.

1 Introduction

In this paper we consider univariate interpolation by Gaussian radial basis functions (RBF) 
and the closely related interpolation by exponential functions. 

RBF interpolation is widely used in applications due to its simplicity and ability to
handle generic scattered multidimensional data [2, 15]. Our interest in RBF interpolation
is motivated by its role in optimization by expected improvement, as this interpolation
determines the mean of a stationary isotropic Gaussian field after conditioning on a
finite number of measurements [10]. We restrict our attention to the Gaussian
(squared-exponential) RBF, which is one of the most popular examples of analytic RBFs. In [16]
we proved that optimization by expected improvement with the corresponding correlation 
function may be inconsistent for infinitely smooth functions. One of the goals of the present 
paper is to rule out this inconsistency for functions analytic in a sufficiently large domain. 
The main ingredient in the proof is a convergence result for RBF interpolation. 

Convergence of RBF interpolation has been studied extensively in recent years [2, 15].
General convergence results are most naturally stated for interpolated functions from "native"
spaces associated with the considered RBF. For a class of multivariate RBFs including
the Gaussian, a strong theorem of this type has been proved by Madych and Nelson [11]. 
This theorem, however, is not sufficient for our purposes, for two reasons. Firstly, the native 
space for the Gaussian RBF is very narrow (in particular, all functions from this space are
entire and vanish at infinity for real values of the arguments). Secondly, and more importantly,
this theorem establishes convergence under the assumption of an asymptotically dense filling of
the design space by the sequence of interpolation nodes. While it is a standard assumption 
for interpolation when considered as a stand-alone procedure, it may not hold in the context 
of optimization, where the nodes are determined by the optimization algorithm rather than 
freely prescribed in advance. For analytic RBFs and interpolated functions, however, one 
can expect the dense filling to be an excessive requirement due to the non-locality of analytic 
dependencies. 

We therefore turn our attention to the univariate case, which is significantly simpler 
than the multivariate case, and where one can hope to obtain much more complete results, 
especially for the Gaussian covariance function. Indeed, it is known that in the limit of 
increasingly flat rescaled Gaussian RBFs the univariate RBF interpolation is equivalent to 
the polynomial interpolation [5]. In [12], Platte and Driscoll established, by a change of
variables, a relation between polynomial interpolation and interpolation by Gaussian RBFs 
for sets of equally spaced nodes. 

There also exists a very simple relation between interpolation by Gaussian RBFs and 
interpolation by linear combinations of simple exponential functions (sometimes called
exponential ridge functions in the multivariate setting). In the context of multivariate
interpolation this relation appeared, in particular, in the work of Schaback [14] connecting
interpolation by Gaussian RBF to the "least" polynomial interpolation of de Boor and Ron.
See also the work of Zwicknagl [17], where interpolation by exponential functions is considered
as an example of a general class of interpolations based on power series kernels.

In this paper we establish a further connection between univariate interpolation by
polynomials, Gaussian RBFs and exponential functions, by deriving in the latter two cases general
error formulas analogous to the well-known error formula for polynomial interpolation.
These formulas are based on the Harish-Chandra-Itzykson-Zuber integral [7-9], which was
earlier used by Bos and De Marchi [1] to determine the distribution of nodes maximizing
the determinant of the Gaussian RBF interpolation matrix. Our error formulas are valid for
arbitrary 1D sets of nodes. They are derived in Section 2.

In Section 3 we use these formulas to prove convergence of interpolations of analytic
functions by exponentials or by Gaussian RBFs for generic infinite sequences of nodes. In
particular, this result does not require the nodes to densely fill the design space.

Finally, in Section 4 we prove the global exponential convergence of optimization by
expected improvement for analytic functions as a straightforward application of the
interpolation convergence theorem.

2 Closed-form interpolation error formulas 

We consider linear interpolation of univariate functions by linear combinations of given basis
functions $f_1, f_2, \ldots$. Given a set of distinct nodes $x_1, \ldots, x_n \in \mathbb{R}$ and a function $f$, we define
the interpolant $If$ by

$$If(x) = \sum_{k=1}^n c_k f_k(x),$$

where the coefficients $c_k$ are chosen so that

$$If(x_l) = f(x_l), \quad l = 1, \ldots, n.$$

We will occasionally write the operator $I$ as $I_{\{x_k\}_{k=1}^n}$ or $I_n$ to emphasize the dependence on
the nodes or their number.

A particular type of interpolation is specified by the choice of basis functions $f_k$. We will
consider the following types:

• Interpolation by Gaussian RBF (denoted $I^g$ or $I^g_{\{x_k\}_{k=1}^n}$) corresponds to Gaussians
centered at the interpolation nodes $x_k$:

$$f_k(x) = e^{-(x-x_k)^2/2}.$$

• For any distinct values $t_1, \ldots, t_n \in \mathbb{R}$, interpolation by exponential functions (denoted
$I^e$ or $I^e_{\{x_k,t_k\}_{k=1}^n}$) corresponds to

$$f_k(x) = e^{t_k x}.$$

• Polynomial interpolation ($I^p$ or $I^p_{\{x_k\}_{k=1}^n}$) corresponds to

$$f_k(x) = x^{k-1}.$$
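
The three choices of basis above can be compared numerically by solving the linear system for the coefficients $c_k$ directly. The following is a minimal sketch, assuming NumPy; the helper names (`interpolate`, `gaussian_basis`, `exponential_basis`, `polynomial_basis`) are illustrative and not taken from the paper, and the nodes, exponents and test function are arbitrary choices.

```python
import numpy as np

def interpolate(basis, nodes, values):
    """Return x -> sum_k c_k f_k(x), with c_k fixed by the interpolation conditions.

    `basis` is a list of callables f_1, ..., f_n (hypothetical helper).
    """
    nodes = np.asarray(nodes, dtype=float)
    # Interpolation matrix: row l holds f_1(x_l), ..., f_n(x_l).
    A = np.column_stack([f(nodes) for f in basis])
    c = np.linalg.solve(A, np.asarray(values, dtype=float))
    return lambda x: sum(ck * f(np.asarray(x, dtype=float)) for ck, f in zip(c, basis))

def gaussian_basis(nodes):
    # f_k(x) = exp(-(x - x_k)^2 / 2), one Gaussian per node.
    return [lambda x, xk=xk: np.exp(-(x - xk) ** 2 / 2) for xk in nodes]

def exponential_basis(ts):
    # f_k(x) = exp(t_k x) for distinct exponents t_k.
    return [lambda x, tk=tk: np.exp(tk * x) for tk in ts]

def polynomial_basis(n):
    # f_k(x) = x^(k-1), k = 1, ..., n.
    return [lambda x, k=k: x ** k for k in range(n)]

nodes = np.array([0.0, 0.3, 0.7, 1.0])
f = np.cos
interp_g = interpolate(gaussian_basis(nodes), nodes, f(nodes))
interp_e = interpolate(exponential_basis([-1.0, 0.0, 1.0, 2.0]), nodes, f(nodes))
interp_p = interpolate(polynomial_basis(len(nodes)), nodes, f(nodes))
```

Solving the dense system is adequate only for a handful of nodes; for larger $n$ all three matrices become ill-conditioned.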

For the interpolation $I$ to be well-defined, the matrix $(f_k(x_l))_{k,l=1}^n$ must be nondegenerate.
This is so for the above three types: for $I^g$ this follows e.g. from the positive definiteness of
the function $e^{-x^2/2}$; for $I^p$ this follows from the nondegeneracy of the Vandermonde matrix;
for $I^e$ this follows e.g. from the arguments below.

Gaussian interpolation $I^g_{\{x_k\}_{k=1}^n}$ reduces to exponential interpolation $I^e_{\{x_k,x_k\}_{k=1}^n}$ (i.e., with
$t_k = x_k$) by noting that

$$e^{-(x-x_k)^2/2} = e^{-x^2/2}\, e^{x_k x}\, e^{-x_k^2/2}.$$

Indeed, thanks to this identity we can write

$$I^g f(x) = \sum_{k=1}^n c_k e^{-(x-x_k)^2/2} = e^{-x^2/2} \sum_{k=1}^n \tilde c_k e^{x_k x} = e^{-x^2/2}\, I^e \tilde f(x),$$

where $\tilde c_k = c_k e^{-x_k^2/2}$ and $\tilde f(x) = e^{x^2/2} f(x)$. In other words, the interpolation operators are related by the
identity

$$I^g_{\{x_k\}_{k=1}^n} = e^{-x^2/2}\, I^e_{\{x_k,x_k\}_{k=1}^n}\, e^{x^2/2}, \tag{1}$$

where $e^{\pm x^2/2}$ is the operator of multiplication by the function $e^{\pm x^2/2}$.
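
The reduction of Gaussian interpolation to exponential interpolation with $t_k = x_k$ can be checked numerically. The sketch below, assuming NumPy and arbitrary illustrative nodes and test function, computes the Gaussian RBF interpolant directly and via the exponential interpolant of $e^{x^2/2} f(x)$, and records the largest discrepancy between the two.

```python
import numpy as np

nodes = np.array([-0.5, 0.1, 0.6, 1.2])   # illustrative nodes x_1, ..., x_n
f = np.sin                                # illustrative test function
x = np.linspace(-1.0, 1.5, 7)             # evaluation points

# Gaussian RBF interpolant: solve sum_k c_k exp(-(x_l - x_k)^2 / 2) = f(x_l).
G = np.exp(-(nodes[:, None] - nodes[None, :]) ** 2 / 2)
c = np.linalg.solve(G, f(nodes))
gauss = np.exp(-(x[:, None] - nodes[None, :]) ** 2 / 2) @ c

# Exponential interpolant with t_k = x_k, applied to f~(x) = exp(x^2 / 2) f(x),
# then multiplied by exp(-x^2 / 2).
E = np.exp(nodes[:, None] * nodes[None, :])
c_tilde = np.linalg.solve(E, np.exp(nodes ** 2 / 2) * f(nodes))
expo = np.exp(-x ** 2 / 2) * (np.exp(x[:, None] * nodes[None, :]) @ c_tilde)

max_gap = np.max(np.abs(gauss - expo))    # expected to be at round-off level
```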

This argument shows in particular that the interpolation $I^e_{\{x_k,t_k\}_{k=1}^n}$ is well-defined, i.e.
its interpolation matrix

$$A = (e^{t_k x_m})_{k,m=1}^n$$

is invertible, at least if $t_k = x_k$. In fact, this interpolation is well-defined for any sets of
distinct values $t_1, \ldots, t_n$ and distinct nodes $x_1, \ldots, x_n$. One way to see this is to use the
remarkable formula of Harish-Chandra-Itzykson-Zuber (HCIZ). To introduce this formula,
we need a few definitions. Consider the diagonal matrices

$$X = \mathrm{diag}(x_1, \ldots, x_n), \qquad T = \mathrm{diag}(t_1, \ldots, t_n).$$

Let $V(X)$ denote the Vandermonde determinant for the points $x_1, \ldots, x_n$:

$$V(X) = \det(x_k^m)_{1\le k\le n,\ 0\le m\le n-1} = \prod_{1\le k<l\le n} (x_l - x_k).$$

Finally, define the constant $\beta_n$ by

$$\beta_n = \prod_{k=0}^{n-1} k!.$$

Then the HCIZ formula reads [7-9]:

$$\det A = \beta_n^{-1}\, V(X)\, V(T) \int_{U(n)} e^{\mathrm{tr}(TU^\dagger XU)}\,dU, \tag{2}$$

where integration is over Haar measure on the group $U(n)$ of unitary matrices of size $n$. Here
and in the sequel $\dagger$ denotes the Hermitian conjugate.

Note that the integrand in (2) is strictly positive. Since $V(X) \ne 0$ and $V(T) \ne 0$ for
distinct $t_1, \ldots, t_n$ and $x_1, \ldots, x_n$, it follows in particular that $\det A \ne 0$, as stated above.
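
One consequence of the HCIZ formula that is easy to test numerically is that $\det A$ always has the sign of $V(X)V(T)$, and that the ratio $\det A / (V(X)V(T))$ does not change when the nodes or exponents are permuted. The sketch below, assuming NumPy, checks this for $n = 3$ over all orderings of two fixed, arbitrarily chosen sets; `vandermonde_det` is an illustrative helper name.

```python
import numpy as np
from itertools import combinations, permutations

def vandermonde_det(points):
    """Product of (p_l - p_k) over all index pairs k < l."""
    n = len(points)
    return float(np.prod([points[l] - points[k] for k, l in combinations(range(n), 2)]))

base_x = (-1.0, 0.2, 0.9)    # illustrative distinct nodes
base_t = (-0.7, 0.4, 1.3)    # illustrative distinct exponents

ratios = []
for x in permutations(base_x):
    for t in permutations(base_t):
        A = np.exp(np.outer(t, x))           # A[k, m] = exp(t_k x_m)
        ratios.append(np.linalg.det(A) / (vandermonde_det(x) * vandermonde_det(t)))
ratios = np.array(ratios)
```

By the formula, each ratio equals the HCIZ integral divided by $\beta_n$, so all entries of `ratios` should be positive and equal up to round-off.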

The main results of this section are HCIZ-integral-based error formulas for the
interpolations $I^g$ and $I^e$.

We first consider the $I^e$ case. We introduce some additional notation:

• Let $Z_{\{x_k,t_k\}_{k=1}^n} = \int_{U(n)} e^{\mathrm{tr}(TU^\dagger XU)}\,dU$ be the integral appearing in the HCIZ formula
(2).

• Let $\int_{S^{2n+1}} \cdot\, dv$ denote integration over the normalized Lebesgue measure on the unit
sphere $S^{2n+1} = \{v : |v| = 1\}$ in $\mathbb{C}^{n+1}$.

• Let $\tilde X$ be the extension of the diagonal matrix $X$ by the value $x$:

$$\tilde X = \mathrm{diag}(x, x_1, \ldots, x_n).$$

• Let $\mathrm{conv}(\tilde X) \subset \mathbb{R}$ be the convex hull of the points $x, x_1, \ldots, x_n$.

• Let $P_v : \mathbb{C}^n \to \mathbb{C}^{n+1}$ be any isometry between $\mathbb{C}^n$ and the orthogonal complement to
the vector $v \in S^{2n+1}$ in $\mathbb{C}^{n+1}$; $P_v$ is assumed to depend measurably on $v$.

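Theorem 1. For any $f \in C^n(\mathrm{conv}(\tilde X))$,

$$f(x) - I^e_{\{x_k,t_k\}_{k=1}^n} f(x) = \frac{\prod_{k=1}^n (x - x_k)}{n!\, Z_{\{x_k,t_k\}_{k=1}^n}} \int_{S^{2n+1}} \int_{U(n)} e^{\mathrm{tr}(TU^\dagger P_v^\dagger \tilde X P_v U)} \Big[ \prod_{k=1}^n \Big( \frac{d}{dq} - t_k \Big) f(q) \Big]_{q = v^\dagger \tilde X v} dv\, dU. \tag{3}$$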



Proof. We first prove formula (3) for functions of the form $f(x) = e^{tx}$ with any $t \in \mathbb{C}$, and
then extend it to all $f \in C^n(\mathrm{conv}(\tilde X))$.

It will be convenient in the following to consider $t$ and $x$ as elements extending the
sequences $t_1, \ldots, t_n$ and $x_1, \ldots, x_n$, respectively, by identifying

$$t_0 = t, \qquad x_0 = x.$$

We begin by recalling the following classical result from the general theory of linear 
interpolation: 

Lemma 1 (see e.g. Theorem 3.8.1 in [4]). Let $I$ be any linear interpolation with distinct
nodes $x_1, \ldots, x_n$ and basis functions $f_1, \ldots, f_n$. Then, assuming $\det(f_k(x_m))_{k,m=1}^n \ne 0$, the
error of interpolation of a function $f_0$ at a point $x_0$ is given by

$$f_0(x_0) - If_0(x_0) = \frac{\det(f_k(x_m))_{k,m=0}^n}{\det(f_k(x_m))_{k,m=1}^n}.$$

As a consequence, for $f(x) = e^{tx}$,

$$f(x) - I^e_{\{x_k,t_k\}_{k=1}^n} f(x) = \frac{\det(e^{t_k x_m})_{k,m=0}^n}{\det(e^{t_k x_m})_{k,m=1}^n}.$$
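
The determinant-ratio expression for the error can be verified on a small example. The sketch below, assuming NumPy and arbitrarily chosen nodes, exponents and evaluation point, compares the error obtained by solving the interpolation system for $f(x) = e^{t_0 x}$ with the ratio of the bordered $(n+1) \times (n+1)$ determinant to the $n \times n$ one.

```python
import numpy as np

nodes = np.array([0.0, 0.4, 1.0])      # x_1, ..., x_n (illustrative)
ts = np.array([-1.0, 0.5, 1.5])        # t_1, ..., t_n (illustrative)
t0, x0 = 0.8, 0.7                      # f(x) = exp(t0 * x), evaluated at x0

# Direct error f(x0) - I^e f(x0): solve for the coefficients, then evaluate.
A = np.exp(np.outer(nodes, ts))        # A[m, k] = exp(t_k x_m)
c = np.linalg.solve(A, np.exp(t0 * nodes))
direct_error = np.exp(t0 * x0) - np.exp(ts * x0) @ c

# Determinant ratio, with t_0 = t0 and x_0 = x0 prepended to the sequences.
t_ext = np.concatenate(([t0], ts))
x_ext = np.concatenate(([x0], nodes))
ratio = (np.linalg.det(np.exp(np.outer(t_ext, x_ext)))
         / np.linalg.det(np.exp(np.outer(ts, nodes))))
```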

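We apply the HCIZ formula to both numerator and denominator and obtain

$$f(x) - I^e_{\{x_k,t_k\}_{k=1}^n} f(x) = \frac{\prod_{k=1}^n (x - x_k)(t - t_k)}{n!\, Z_{\{x_k,t_k\}_{k=1}^n}} \int_{U(n+1)} e^{\mathrm{tr}(\tilde T \tilde U^\dagger \tilde X \tilde U)}\, d\tilde U,$$

where integration is performed over unitary matrices $\tilde U$ of size $n + 1$, and

$$\tilde X = \mathrm{diag}(x, x_1, \ldots, x_n), \qquad \tilde T = \mathrm{diag}(t, t_1, \ldots, t_n).$$

Let us write the $(n+1)$-dimensional trace in this formula as the sum of the part corresponding
to the first entry of the matrix $\tilde T$ and the remaining $n$-dimensional trace. To this end, denote
by $v$ the first column of the matrix $\tilde U$, and by $U'$ denote the remaining $(n+1) \times n$ submatrix.
Then we can write

$$\mathrm{tr}(\tilde T \tilde U^\dagger \tilde X \tilde U) = \mathrm{tr}(T U'^\dagger \tilde X U') + t\, v^\dagger \tilde X v.$$

We can replace integration over $\tilde U$ by double integration, first over $v \in S^{2n+1}$ and then over
the complementary matrices $U'$. It is convenient to fix for each given $v$ one complementary
matrix $P_v$, and then make the substitution

$$U' = P_v U,$$

where $U \in U(n)$. In this way we reduce integration over $U(n+1)$ to integration over
$S^{2n+1} \times U(n)$. The resulting measure of integration is the product of the normalized Lebesgue
measure on the sphere with Haar measure on $U(n)$. As a result, we get

$$f(x) - I^e_{\{x_k,t_k\}_{k=1}^n} f(x) = \frac{\prod_{k=1}^n (x - x_k)}{n!\, Z_{\{x_k,t_k\}_{k=1}^n}} \int_{S^{2n+1}} \int_{U(n)} e^{\mathrm{tr}(TU^\dagger P_v^\dagger \tilde X P_v U)}\, e^{t v^\dagger \tilde X v} \prod_{k=1}^n (t - t_k)\, dv\, dU.$$

Since

$$\prod_{k=1}^n \Big( \frac{d}{dq} - t_k \Big) e^{tq} = \prod_{k=1}^n (t - t_k)\, e^{tq},$$

the proof is complete for $f(x) = e^{tx}$.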

It remains to extend formula (3) to all functions $f \in C^n(\mathrm{conv}(\tilde X))$. This can be done by
standard arguments, using the formula's linearity. First note that the formula holds for all
polynomials, by repeatedly differentiating it, written for $f(x) = e^{tx}$, with respect to $t$ at $t = 0$.
Then, for any $f \in C^n(\mathrm{conv}(\tilde X))$, apply the Weierstrass theorem to $f^{(n)}$ to show that for
any $\epsilon$ there is a polynomial $p$ such that $|f^{(k)}(x') - p^{(k)}(x')| < \epsilon$ for all $x' \in \mathrm{conv}(\tilde X)$ and all
derivatives $k = 0, 1, \ldots, n$. □

Remark. Formula (3) leaves some freedom for the choice of $P_v$. One natural choice is the
one diagonalizing the matrix $P_v^\dagger \tilde X P_v$ and placing its eigenvalues according to the order of
eigenvalues in $X$; see the proof of Theorem 2 in the next section.

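It follows from (1) that the interpolation errors for $I^g$ and $I^e$ are simply related by

$$f(x) - I^g_{\{x_k\}_{k=1}^n} f(x) = e^{-x^2/2} \Big( \tilde f(x) - I^e_{\{x_k,x_k\}_{k=1}^n} \tilde f(x) \Big), \qquad \tilde f(x) = e^{x^2/2} f(x). \tag{4}$$

Theorem 1 then immediately implies an error formula for the Gaussian interpolation:

Corollary 1. For any $f \in C^n(\mathrm{conv}(\tilde X))$,

$$f(x) - I^g_{\{x_k\}_{k=1}^n} f(x) = \frac{\prod_{k=1}^n (x - x_k)}{n!\, Z_{\{x_k,x_k\}_{k=1}^n}} \int_{S^{2n+1}} \int_{U(n)} e^{\mathrm{tr}(XU^\dagger P_v^\dagger \tilde X P_v U)}\, e^{-x^2/2} \Big[ \prod_{k=1}^n \Big( \frac{d}{dq} - x_k \Big) \big( e^{q^2/2} f(q) \big) \Big]_{q = v^\dagger \tilde X v} dv\, dU.$$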



Theorem 1 also allows us to show that the exponential interpolation converges to the
polynomial one if $t_k \to 0$ for all $k$, by viewing error formula (3) as a generalization of the
classical error formula for polynomial interpolation.

Corollary 2. For any $f \in C^n(\mathrm{conv}(\tilde X))$,

$$\lim_{t_1, \ldots, t_n \to 0} I^e_{\{x_k,t_k\}_{k=1}^n} f(x) = I^p_{\{x_k\}_{k=1}^n} f(x).$$
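
The limit in Corollary 2 can be observed numerically by shrinking the exponents towards zero along a fixed direction. The sketch below, assuming NumPy, uses arbitrary nodes, the test function $\cos$, and exponents $t_k = \varepsilon d_k$ with distinct illustrative values $d_k$; it records the largest gap between the exponential and polynomial interpolants on a grid for three values of $\varepsilon$.

```python
import numpy as np

nodes = np.array([0.0, 0.35, 0.6, 1.0])     # illustrative nodes
f = np.cos                                  # illustrative test function
x = np.linspace(0.0, 1.0, 11)               # evaluation grid

def exp_interp(ts):
    """Exponential interpolant with exponents ts, evaluated on the grid x."""
    A = np.exp(np.outer(nodes, ts))         # A[m, k] = exp(t_k x_m)
    c = np.linalg.solve(A, f(nodes))
    return np.exp(np.outer(x, ts)) @ c

# Polynomial interpolant of degree n - 1 through the same nodes.
poly = np.polyval(np.polyfit(nodes, f(nodes), len(nodes) - 1), x)

directions = np.array([-1.0, 0.5, 1.0, 2.0])   # distinct, so eps * directions stay distinct
gaps = [np.max(np.abs(exp_interp(eps * directions) - poly)) for eps in (1.0, 0.1, 0.01)]
```

Much smaller values of $\varepsilon$ are not advisable in double precision, because the matrix $(e^{t_k x_m})$ approaches a rank-one matrix and the linear solve loses accuracy.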

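Proof. Recall the well-known error expression for polynomial interpolation based on the
Hermite-Genocchi formula for divided differences:

$$f(x) - I^p_{\{x_k\}_{k=1}^n} f(x) = \frac{\prod_{k=1}^n (x - x_k)}{n!} \int_{\Delta_n} f^{(n)}\Big( \sum_{k=0}^n s_k x_k \Big)\, ds, \tag{5}$$

where $x_0 = x$, $s = (s_0, s_1, \ldots, s_n)$ and the integration is over normalized Lebesgue measure on the
$n$-dimensional simplex $\Delta_n = \{ s : \sum_{k=0}^n s_k = 1,\ s_k \ge 0 \}$.

To prove the corollary, we use error formulas (3) and (5) to show that

$$\lim_{t_1, \ldots, t_n \to 0} \Big( f(x) - I^e_{\{x_k,t_k\}_{k=1}^n} f(x) \Big) = f(x) - I^p_{\{x_k\}_{k=1}^n} f(x).$$

Indeed, first observe that in this limit the differential operator on the r.h.s. of (3) tends
to $d^n/dq^n$, $Z_{\{x_k,t_k\}_{k=1}^n}$ tends to 1, and the exponential factor in the integrand tends to 1, so
that the dependence on $U$ vanishes, making integration over $U(n)$ trivial:

$$\lim_{t_1, \ldots, t_n \to 0} \Big( f(x) - I^e_{\{x_k,t_k\}_{k=1}^n} f(x) \Big) = \frac{\prod_{k=1}^n (x - x_k)}{n!} \int_{S^{2n+1}} f^{(n)}(v^\dagger \tilde X v)\, dv.$$

To see how the integration over $S^{2n+1}$ transforms, write $v = (a_0 + ib_0, \ldots, a_n + ib_n)$ and
substitute $a_k = \sqrt{s_k} \cos\phi_k$, $b_k = \sqrt{s_k} \sin\phi_k$ for each $k$; the Jacobian of this substitution equals
$2^{-(n+1)}$. Then, using Dirac's delta $\delta$, the identity $\delta(|v| - 1) = 2\delta((|v| - 1)(|v| + 1)) = 2\delta(|v|^2 - 1)$,
and the fact that the area of $S^{2n+1}$ equals $2\pi^{n+1}/n!$,

$$\int_{S^{2n+1}} f^{(n)}(v^\dagger \tilde X v)\, dv = \frac{n!}{2\pi^{n+1}} \int_{\mathbb{R}^{2n+2}} f^{(n)}(v^\dagger \tilde X v)\, \delta(|v| - 1) \prod_{k=0}^n da_k\, db_k$$

$$= \frac{n!}{\pi^{n+1}} \int_{\mathbb{R}^{2n+2}} f^{(n)}\Big( \sum_{k=0}^n (a_k^2 + b_k^2) x_k \Big)\, \delta\Big( \sum_{k=0}^n (a_k^2 + b_k^2) - 1 \Big) \prod_{k=0}^n da_k\, db_k$$

$$= \frac{n!}{(2\pi)^{n+1}} \int_{s_k \ge 0,\ 0 \le \phi_k < 2\pi} f^{(n)}\Big( \sum_{k=0}^n s_k x_k \Big)\, \delta\Big( \sum_{k=0}^n s_k - 1 \Big) \prod_{k=0}^n ds_k\, d\phi_k$$

$$= n! \int_{s_k \ge 0} f^{(n)}\Big( \sum_{k=0}^n s_k x_k \Big)\, \delta\Big( \sum_{k=0}^n s_k - 1 \Big) \prod_{k=0}^n ds_k = \int_{\Delta_n} f^{(n)}\Big( \sum_{k=0}^n s_k x_k \Big)\, ds.$$

□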
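
The change of variables in this proof says that if $v$ is uniformly distributed on the unit sphere of $\mathbb{C}^{n+1}$, then the vector $s_k = |v_k|^2$ is uniformly distributed on the simplex $\Delta_n$. A Monte Carlo sketch of this fact, assuming NumPy, is given below; the sample size, seed and the values in `x_ext` are arbitrary, and the reference moments quoted in the comments are the standard ones for the uniform (Dirichlet$(1, \ldots, 1)$) distribution on the simplex.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3                                    # v lives in C^{n+1}
samples = 200_000

# Uniform points on the unit sphere of C^{n+1}: normalize standard complex Gaussians.
z = rng.standard_normal((samples, n + 1)) + 1j * rng.standard_normal((samples, n + 1))
v = z / np.linalg.norm(z, axis=1, keepdims=True)
s = np.abs(v) ** 2                       # s_k = |v_k|^2, a point of the simplex

x_ext = np.array([0.3, 0.0, 0.5, 1.0])   # (x, x_1, ..., x_n), illustrative values
q = s @ x_ext                            # q = v^dagger X~ v = sum_k s_k x_k

# For the uniform law on the simplex:
#   E[s_k] = 1/(n+1)  and  E[s_k^2] = 2/((n+1)(n+2)).
mean_s0 = s[:, 0].mean()
mean_s0_sq = (s[:, 0] ** 2).mean()
mean_q = q.mean()                        # should approach the plain average of x_ext
```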



3 Convergence of interpolation for analytic functions 

In this section we use Theorem 1 to prove convergence of interpolation on a bounded segment
$[a, b] \subset \mathbb{R}$ for functions $f$ analytic in a sufficiently large complex domain $\mathcal{D} \subset \mathbb{C}$. We state
our convergence theorem simultaneously for all three types of interpolation appearing in the
previous section, thus emphasizing the similarity between them.

Theorem 2. Suppose $f$ is analytic in a complex domain $\mathcal{D} \supset [a, b]$, and $\mathrm{dist}([a, b], \partial\mathcal{D}) >
\rho > 0$. Let $I$ denote any of the interpolations $I^e, I^g, I^p$ for a sequence of distinct nodes
$x_1, x_2, \ldots \subset [a, b]$; in the case of $I^e$ assume additionally that there exists $R$ such that $|t_k| < R$
for all $k$.
Then

$$\sup_{x \in [a,b]} |f(x) - I_n f(x)| \le c \Big( \frac{b-a}{\rho} \Big)^n \tag{6}$$

with some constant $c = c(f, a, b, \rho, R)$.
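
The point that the nodes need not fill $[a, b]$ densely can be illustrated in the polynomial case $I^p$, which is the easiest of the three to compute stably. The sketch below, assuming NumPy, interpolates $f(x) = 1/(3 - x)$ on $[a, b] = [0, 1]$ using nodes confined to $[0, 0.5]$; here the distance from $[0, 1]$ to the pole is 2, so any $\rho < 2$ is admissible. The function, the node sequence and the helper `newton_interpolant` are illustrative choices, not taken from the paper.

```python
import numpy as np

def newton_interpolant(nodes, values):
    """Polynomial interpolant in Newton form, built from divided differences."""
    nodes = np.asarray(nodes, dtype=float)
    coef = np.array(values, dtype=float)
    for j in range(1, len(nodes)):
        # After step j, coef[i] holds the divided difference f[x_{i-j}, ..., x_i] for i >= j.
        coef[j:] = (coef[j:] - coef[j - 1:-1]) / (nodes[j:] - nodes[:-j])

    def evaluate(x):
        x = np.asarray(x, dtype=float)
        result = np.full_like(x, coef[-1])
        for k in range(len(nodes) - 2, -1, -1):   # Horner scheme for the Newton form
            result = result * (x - nodes[k]) + coef[k]
        return result

    return evaluate

f = lambda x: 1.0 / (3.0 - x)          # analytic except for a pole at z = 3
a, b = 0.0, 1.0
golden = (np.sqrt(5.0) - 1.0) / 2.0
x_eval = np.linspace(a, b, 201)

errors = []
for n in range(1, 11):
    # Distinct nodes confined to [0, 0.5]: they never enter the right half of [a, b].
    nodes = 0.5 * ((np.arange(1, n + 1) * golden) % 1.0)
    p = newton_interpolant(nodes, f(nodes))
    errors.append(np.max(np.abs(f(x_eval) - p(x_eval))))
```

For this particular $f$ the classical remainder formula gives the explicit bound $|f(x) - I^p_n f(x)| \le \tfrac12 \big(\tfrac{b-a}{2}\big)^n$ on $[0, 1]$, so `errors` should decay at least geometrically with ratio $1/2$ even though the nodes avoid half of the segment.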

Proof. We give the proof only for $I^e$; the result for $I^g$ then follows immediately from relation
(4), while the result for the polynomial interpolation can be seen as a trivial special case
thanks to Corollary 2 of Theorem 1.

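It is convenient to represent the error $f(x) - I^e_n f(x)$ in the form

$$f(x) - I^e_n f(x) = \frac{\int_{S^{2n+1}} \int_{U(n)} K_n(x, v, U)\, \phi_n(x, v)\, dv\, dU}{\int_{U(n)} K_n(U)\, dU},$$

where

$$\phi_n(x, v) = \frac{\prod_{k=1}^n (x - x_k)}{n!} \Big[ \prod_{k=1}^n \Big( \frac{d}{dq} - t_k \Big) f(q) \Big]_{q = v^\dagger \tilde X_n v}, \tag{7}$$

$$K_n(x, v, U) = e^{\mathrm{tr}(T_n U^\dagger P_v^\dagger \tilde X_n P_v U)}, \qquad K_n(U) = e^{\mathrm{tr}(T_n U^\dagger X_n U)}.$$

Here $X_n$, $T_n$ and $\tilde X_n$ denote the matrices $X$, $T$ and $\tilde X$ built from the first $n$ nodes and exponents.

Lemma 2. Under the assumptions of Theorem 2,

$$\sup_{x \in [a,b],\ v \in S^{2n+1}} |\phi_n(x, v)| \le c_1 \Big( \frac{b-a}{\rho} \Big)^n$$

with some constant $c_1 = c_1(f, a, b, \rho, R)$.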

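Proof. Choose a contour $\gamma \subset \mathcal{D}$ enclosing the segment $[a, b]$ so that

$$\min_{z \in \gamma,\ q \in [a,b]} |z - q| \ge \rho.$$

We use Cauchy's formula for $f(q)$,

$$f(q) = \frac{1}{2\pi i} \oint_\gamma \frac{f(z)\, dz}{z - q},$$

and substitute it in (7). Expanding the product of first order differential operators,

$$\frac{1}{n!} \Big| \prod_{k=1}^n \Big( \frac{d}{dq} - t_k \Big) \frac{1}{z - q} \Big| \le \frac{1}{n!} \sum_{s=0}^n \binom{n}{s} \frac{(n-s)!\, R^s}{|z - q|^{n-s+1}} \le \frac{1}{\rho^{n+1}} \sum_{s=0}^n \frac{\rho^s R^s}{s!} \le \frac{e^{\rho R}}{\rho^{n+1}}.$$

Since $|x - x_k| \le b - a$ for all $k$, it follows that

$$|\phi_n(x, v)| \le (b - a)^n\, \frac{e^{\rho R}}{\rho^{n+1}}\, \frac{1}{2\pi} \oint_\gamma |f(z)|\, |dz|,$$

which implies the lemma's claim with

$$c_1 = \frac{e^{\rho R}}{2\pi\rho} \oint_\gamma |f(z)|\, |dz|.$$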



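At this point we need to specify the choice of $P_v$. Let $\sigma$ be a permutation arranging $x_1, \ldots, x_n$
in the increasing order:

$$x_{\sigma(1)} < \ldots < x_{\sigma(n)}.$$

Choose $P_v$ so that $P_v^\dagger \tilde X_n P_v = \mathrm{diag}(\tilde x_1, \ldots, \tilde x_n)$, where

$$\tilde x_{\sigma(1)} \le \ldots \le \tilde x_{\sigma(n)}.$$

Lemma 3. With the above choice of $P_v$,

$$e^{R(a-b)} \le \frac{K_n(x, v, U)}{K_n(U)} \le e^{R(b-a)} \tag{8}$$

for any $x \in [a, b]$, $v \in S^{2n+1}$ and $U \in U(n)$.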



Proof. We have

$$\Big| \ln \frac{K_n(x, v, U)}{K_n(U)} \Big| = \big| \mathrm{tr}\big( U T_n U^\dagger (P_v^\dagger \tilde X_n P_v - X_n) \big) \big| \le \|T_n\|\, \mathrm{tr}\, |P_v^\dagger \tilde X_n P_v - X_n| \le R\, \mathrm{tr}\, |P_v^\dagger \tilde X_n P_v - X_n|,$$

where we have used the well-known inequality $|\mathrm{tr}(BC)| \le \|B\|\, \mathrm{tr}\, |C|$ with $|C| = (C^\dagger C)^{1/2}$.
Thanks to our choice of $P_v$, $P_v^\dagger \tilde X_n P_v - X_n$ is a diagonal operator, and

$$\mathrm{tr}\, |P_v^\dagger \tilde X_n P_v - X_n| = \sum_{k=1}^n |\tilde x_{\sigma(k)} - x_{\sigma(k)}|.$$

It remains to observe that

$$\sum_{k=1}^n |\tilde x_{\sigma(k)} - x_{\sigma(k)}| \le b - a. \tag{9}$$

Recall that the eigenvalues of a restriction of a quadratic form to a subspace of co-dimension
1 alternate with the original eigenvalues; in particular the eigenvalues of $P_v^\dagger \tilde X_n P_v$ alternate
with the eigenvalues of $\tilde X_n$. From this and from our convention on the order of eigenvalues
it is easy to see that the open intervals $\{(x_{\sigma(k)}, \tilde x_{\sigma(k)})\}_{k=1}^n$ do not overlap. For example, if
$x \in [\min_{k=1,\ldots,n} x_k, \max_{k=1,\ldots,n} x_k]$, then there exists $n_0$ such that $x_{\sigma(n_0)} \le x \le x_{\sigma(n_0+1)}$, and
we can write

$$x_{\sigma(1)} \le \tilde x_{\sigma(1)} \le \ldots \le x_{\sigma(n_0)} \le \tilde x_{\sigma(n_0)} \le x \le \tilde x_{\sigma(n_0+1)} \le x_{\sigma(n_0+1)} \le \ldots \le \tilde x_{\sigma(n)} \le x_{\sigma(n)},$$

which makes the absence of overlapping clear. The other cases, $x < \min_{k=1,\ldots,n} x_k$ and
$x > \max_{k=1,\ldots,n} x_k$, are considered similarly. Since all the intervals $\{(x_{\sigma(k)}, \tilde x_{\sigma(k)})\}_{k=1}^n$ at the
same time lie in $[a, b]$, we conclude (9). □
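
The interlacing argument behind (9) can be tested numerically. The sketch below, assuming NumPy, draws random unit vectors $v \in \mathbb{C}^{n+1}$, completes each to an orthonormal basis by a QR factorization (one possible way to realize an isometry $P_v$; the eigenvalues of the compression do not depend on this choice), and compares the sorted eigenvalues of $P_v^\dagger \tilde X P_v$ with the sorted nodes. The nodes, the point $x$ and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
nodes = np.array([0.1, 0.4, 0.55, 0.9])     # x_1, ..., x_n in [a, b] = [0, 1]
x = 0.7                                     # evaluation point in [a, b]
n = len(nodes)
X_ext = np.diag(np.concatenate(([x], nodes))).astype(complex)   # X~ = diag(x, x_1, ..., x_n)

worst = 0.0
for _ in range(500):
    # Random complex matrix; its QR factor Q is unitary and Q[:, 0] plays the role of v.
    M = rng.standard_normal((n + 1, n + 1)) + 1j * rng.standard_normal((n + 1, n + 1))
    Q, _ = np.linalg.qr(M)
    P = Q[:, 1:]                            # columns span the orthogonal complement of v
    compressed = np.linalg.eigvalsh(P.conj().T @ X_ext @ P)     # ascending eigenvalues
    total = np.sum(np.abs(compressed - np.sort(nodes)))         # l.h.s. of (9)
    worst = max(worst, total)

spread = max(x, nodes.max()) - min(x, nodes.min())   # length of conv(X~), at most b - a
```

In every trial the sum should stay below `spread`, which in turn is at most $b - a$.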

The claim (6) of the theorem now follows immediately from Lemma 2 and the upper
bound in (8), with

$$c = e^{R(b-a)} c_1 = \frac{e^{(\rho + b - a)R}}{2\pi\rho} \oint_\gamma |f(z)|\, |dz|.$$

□

The above theorem proves convergence only if the analyticity domain of the interpolated
function is sufficiently large. It is known that if the domain is not large enough, the
interpolants may diverge: in the case of polynomial interpolation this is the well-known Runge
phenomenon [13], and a similar effect holds for the RBF interpolation, see [6, 12].

4 Optimization by expected improvement for analytic functions



In this section we describe an application of Theorem 2 to optimization by expected
improvement (EI). Optimization by EI is a kind of stochastic Bayesian optimization popular
in engineering applications [10]. We consider the simplest version of the algorithm with a
centered Gaussian process and a fixed covariance function.

Suppose that we are searching for the global minimum of a function $f$ on a segment $[a, b]$.
We iteratively sample points $x_1, x_2, \ldots \subset [a, b]$, and evaluate the function $f$ at these points.
For each $n$ we define the current best result as

$$f_n^* = \min_{k=1,\ldots,n} f(x_k).$$

The question is whether $f_n^*$ converges to the global minimum

$$f^* = \min_{x \in [a,b]} f(x),$$

and if so, how fast.

In optimization by EI, $f$ is assumed to be a realization of a centered Gaussian process
$\{\xi_x\}_{x \in [a,b]}$ with a given covariance function $G(x, x') = E(\xi_x \xi_{x'})$, and the choice of each $x_{n+1}$ is
determined from the history $\{x_k, f(x_k)\}_{k=1}^n$ by maximizing the expectation of improvement
of the current best result for the process conditioned on the event $\{\xi_{x_k} = f(x_k)\}_{k=1}^n$:

$$x_{n+1} = \arg\max_{x \in [a,b]} \mathfrak{I}_n(x),$$

where

$$\mathfrak{I}_n(x) = E\big( f_n^* - \min(f_n^*, \xi_x) \,\big|\, \{\xi_{x_k} = f(x_k)\}_{k=1}^n \big).$$



In this way, each optimization iteration is reduced to an auxiliary optimization problem
$\mathfrak{I}_n \to \max_x$, which can be written in an analytic form and readily solved numerically for
moderate values of $n$. See [16] for more details and a bibliography on EI.

In the sequel we assume that the auxiliary optimization problem is exactly solved at each
step $n$.
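
The iteration described above can be sketched in a few lines. The code below, assuming NumPy, uses the Gaussian covariance $e^{-(x-x')^2/2}$ and the standard closed form of the expected improvement for a Gaussian posterior. It departs from the idealized setting in two clearly marked ways: the auxiliary maximization is approximated on a fixed grid rather than solved exactly, and a small diagonal `jitter` is added purely as a numerical safeguard. The objective, the starting point and all function names are illustrative.

```python
import math
import numpy as np

def kernel(x, y):
    """Gaussian covariance exp(-(x - y)^2 / 2) between two 1D point sets."""
    return np.exp(-(np.asarray(x)[:, None] - np.asarray(y)[None, :]) ** 2 / 2)

def posterior(x_grid, nodes, values, jitter=1e-10):
    """Posterior mean and standard deviation of the centered process on x_grid."""
    K = kernel(nodes, nodes) + jitter * np.eye(len(nodes))   # jitter: numerical safeguard only
    k = kernel(x_grid, nodes)
    mean = k @ np.linalg.solve(K, values)
    var = 1.0 - np.sum(k * np.linalg.solve(K, k.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def expected_improvement(mean, std, best):
    """E[best - min(best, xi)] for xi ~ N(mean, std^2), elementwise."""
    ei = np.maximum(best - mean, 0.0)            # value in the limit std -> 0
    pos = std > 1e-12
    z = (best - mean[pos]) / std[pos]
    cdf = np.array([0.5 * math.erfc(-zi / math.sqrt(2.0)) for zi in z])
    pdf = np.exp(-z ** 2 / 2) / math.sqrt(2.0 * math.pi)
    ei[pos] = (best - mean[pos]) * cdf + std[pos] * pdf
    return ei

f = lambda x: np.sin(3.0 * x) + 0.5 * x          # illustrative objective on [a, b] = [0, 2]
grid = np.linspace(0.0, 2.0, 401)                # grid approximation of the inner argmax
nodes = np.array([1.0])                          # arbitrary starting point
for _ in range(5):
    values = f(nodes)
    mean, std = posterior(grid, nodes, values)
    ei = expected_improvement(mean, std, values.min())
    nodes = np.append(nodes, grid[np.argmax(ei)])
best_found = f(nodes).min()
```

Only a few iterations are run here because the Gaussian covariance matrix becomes severely ill-conditioned as nodes accumulate; a serious implementation would need higher-precision arithmetic or a more careful linear solver.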

A popular choice of covariance function in optimization by EI is the Gaussian function

$$G(x, x') = G(x - x') = e^{-(x - x')^2/2}. \tag{10}$$

In [16], we have proved that the optimization by EI with this covariance function does not
in general converge to the global optimum for $C^\infty$ functions $f$. We prove now that if $f$ is
analytic in a sufficiently large complex neighborhood of $[a, b]$, then the optimization does
converge, and moreover with an exponential convergence rate.

Theorem 3. Consider optimization of a (real-valued) function $f$ on the segment $[a, b]$ by EI
with covariance function (10). Suppose that $f$ continues analytically to a complex domain
$\mathcal{D} \supset [a, b]$ such that $\mathrm{dist}([a, b], \partial\mathcal{D}) > \rho > |b - a|$. Then $f_n^*$ converges to the global minimum
$f^*$ of $f$ on $[a, b]$, and

$$f_n^* - f^* = O\Big( \Big( \frac{b-a}{\rho} \Big)^n \Big), \quad n \to \infty.$$

Proof. Let $m_n(x)$ and $\sigma_n^2(x)$ denote the posterior mean and variance of the process $\xi_x$
conditioned on the event $\{\xi_{x_k} = f(x_k)\}_{k=1}^n$:

$$\xi_x \,\big|\, \{\xi_{x_k} = f(x_k)\}_{k=1}^n \sim N(m_n(x), \sigma_n^2(x)).$$

A straightforward computation with Gaussians shows that $m_n(x)$ is the interpolation of $f$
by the RBF $G$ with the nodes $x_1, \ldots, x_n$, i.e.

$$m_n(x) = I^g_n f(x).$$

We then obtain from Theorem 2 that for some constant $c$

$$\max_{x \in [a,b]} |f(x) - m_n(x)| \le c \Big( \frac{b-a}{\rho} \Big)^n. \tag{11}$$

On the other hand, it follows from results proved in [16] (see Theorem 2 there and the example
immediately below it) that in the case of covariance (10) the posterior variance $\sigma_n^2(x)$ converges
faster than exponentially to 0 uniformly on $[a, b]$, for any sequence $x_1, x_2, \ldots \subset [a, b]$:

$$\max_{x \in [a,b]} \sigma_n^2(x) = O(\epsilon^n), \quad n \to \infty, \tag{12}$$

for any $\epsilon > 0$.

We will prove the theorem by showing that for sufficiently large $n$

$$f_{n+1}^* - f^* \le 3c \Big( \frac{b-a}{\rho} \Big)^n, \tag{13}$$

where $c$ is from (11).



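Fix $n$. Since $f_{n+1}^* \le f_n^*$, it suffices to prove (13) in the case when

$$f_n^* - f^* > 3c \Big( \frac{b-a}{\rho} \Big)^n. \tag{14}$$

Suppose that (13) does not hold. Then $f(x_{n+1}) - f^* > 3c \big( \frac{b-a}{\rho} \big)^n$ and, by (11),

$$m_n(x_{n+1}) - f^* > 2c \Big( \frac{b-a}{\rho} \Big)^n. \tag{15}$$

Consider now the minimizer $x^* \in [a, b]$ for which $f(x^*) = f^*$. Again using (11), we have

$$m_n(x^*) - f^* \le c \Big( \frac{b-a}{\rho} \Big)^n. \tag{16}$$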



We see by comparing (15) with (16) that the expected value of the function $f$ at $x^*$ is
lower than at $x_{n+1}$. By exploiting the smallness of the variance, expressed by (12), we will
conclude that $x^*$ provides a better expected improvement than $x_{n+1}$, and thus will reach a
contradiction with the definition of $x_{n+1}$.

Lemma 4. For all $n$ and $x \in [a, b]$,

$$|\mathfrak{I}_n(x) - (f_n^* - \min(f_n^*, m_n(x)))| \le \sigma_n(x). \tag{17}$$

Proof. Let $\chi(s) = f_n^* - \min(f_n^*, s)$. Then we can write the l.h.s. of (17) as

$$|E(\chi(\xi_{x;n}) - \chi(m_n(x)))|,$$

where by $\xi_{x;n}$ we denote the conditioned process: $\xi_{x;n} = \xi_x \,|\, \{\xi_{x_k} = f(x_k)\}_{k=1}^n$. But since
$|\chi(s_1) - \chi(s_2)| \le |s_1 - s_2|$ for all $s_1, s_2$, we have

$$|E(\chi(\xi_{x;n}) - \chi(m_n(x)))| \le E|\xi_{x;n} - m_n(x)| \le \sqrt{E(\xi_{x;n} - m_n(x))^2} = \sigma_n(x).$$

□
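
Inequality (17) can be checked against the standard closed form of the expected improvement for a Gaussian posterior. The sketch below uses only the Python standard library plus NumPy for the grid; the value of `best` (standing for $f_n^*$) and the ranges of posterior means and standard deviations are arbitrary test values, and `expected_improvement` is an illustrative helper name.

```python
import math
import numpy as np

def expected_improvement(mean, std, best):
    """Closed form of E[best - min(best, xi)] for xi ~ N(mean, std^2), std > 0."""
    z = (best - mean) / std
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    pdf = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

best = 0.3                                   # stands for the current best value f_n^*
worst_slack = math.inf
for mean in np.linspace(-2.0, 2.0, 41):      # candidate posterior means m_n(x)
    for std in (1e-3, 1e-2, 0.1, 0.5, 1.0):  # candidate posterior deviations sigma_n(x)
        plug_in = best - min(best, mean)     # f_n^* - min(f_n^*, m_n(x))
        gap = abs(expected_improvement(mean, std, best) - plug_in)
        worst_slack = min(worst_slack, std - gap)   # (17) requires std - gap >= 0
```

In this Gaussian setting the closed form shows that the gap is in fact at most $\sigma_n(x)/\sqrt{2\pi}$, so `worst_slack` should be strictly positive; the lemma only needs the weaker bound $\sigma_n(x)$.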

Applying this lemma to $x = x_{n+1}$ and $x = x^*$, we get

$$\mathfrak{I}_n(x_{n+1}) \le f_n^* - \min(f_n^*, m_n(x_{n+1})) + \sigma_n(x_{n+1}),$$
$$\mathfrak{I}_n(x^*) \ge f_n^* - \min(f_n^*, m_n(x^*)) - \sigma_n(x^*),$$

which implies

$$\mathfrak{I}_n(x^*) - \mathfrak{I}_n(x_{n+1}) \ge [\min(f_n^*, m_n(x_{n+1})) - \min(f_n^*, m_n(x^*))] - \sigma_n(x_{n+1}) - \sigma_n(x^*).$$

Inequalities (14), (15) and (16) imply that the expression in brackets here is greater than
$c \big( \frac{b-a}{\rho} \big)^n$. Also, thanks to (12), $\sigma_n(x) < \frac{c}{3} \big( \frac{b-a}{\rho} \big)^n$ for all sufficiently large $n$ and all $x \in [a, b]$,
in particular for $x_{n+1}$ and $x^*$. We conclude that

$$\mathfrak{I}_n(x^*) - \mathfrak{I}_n(x_{n+1}) > \frac{c}{3} \Big( \frac{b-a}{\rho} \Big)^n > 0,$$

which completes the proof. □

We end this section with a brief discussion of the obtained result. Note that it is of course 
a consequence of the strong assumption of analyticity that the convergence is both global 
and exponential (compare with the local exponential convergence of classical gradient-based 
numerical optimization and with the global power law convergence of EI optimization for 
finitely smooth functions [3]). 

Note also that the strong claim of this theorem only pertains to the convergence $f_n^* \to f^*$;
we have not at all claimed the convergence $x_n \to x^*$ or $f(x_n) \to f^*$.

Finally, we remark that one must be careful with practical implementations of the EI
algorithm when used with covariance function (10) and applied to analytic objective functions, as
the algorithm involves ill-conditioned interpolation matrices and other elements potentially
sensitive to round-off errors and/or requiring high-precision computations (see [16]).